OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Word Searching in Document Images Using Word Portion Matching

Internal identifier: 001832 (Main/Exploration); previous: 001831; next: 001833

Authors: Yue Lu [Singapore]; Lim Tan [Singapore]

RBID : ISTEX:19435F2C25ADE5DC5D73CA65197979D6C31294B6

Abstract

This paper proposes an approach for searching for a word portion in document images, to facilitate the detection and location of user-specified query words. A feature string is synthesized according to the character sequence in the user-specified word, and each word image extracted from the documents is represented by a feature string. An inexact string matching technique then measures the similarity between the two feature strings, from which we can estimate how relevant the document word image is to the user-specified word and decide whether a portion of it is the same as the user-specified word. Experimental results on real document images show that the approach is promising: it can detect and locate document words that entirely or partially match the user-specified word.
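
The portion-matching step described in the abstract can be illustrated with a generic approximate substring search. The sketch below is not the authors' method: this record does not give their feature codes or matching costs, so plain characters stand in for feature strings, unit edit costs are assumed, and the name `best_portion_match` is hypothetical.

```python
def best_portion_match(query: str, word: str) -> tuple[int, int]:
    """Return (distance, end): the smallest edit distance between `query`
    and any contiguous portion of `word`, and the index in `word` just
    past the first portion that reaches this distance.

    Assumption: unit costs for substitution, insertion and deletion.
    """
    # prev[j] = best cost of aligning the query prefix processed so far
    # with some portion of `word` that ends at position j.
    # An empty query matches an empty portion anywhere at zero cost,
    # which is what lets a match start at any position in `word`.
    prev = [0] * (len(word) + 1)
    for i in range(1, len(query) + 1):
        # Aligning query[:i] with an empty portion costs i deletions.
        curr = [i] + [0] * len(word)
        for j in range(1, len(word) + 1):
            substitution = prev[j - 1] + (query[i - 1] != word[j - 1])
            deletion = prev[j] + 1       # drop a query symbol
            insertion = curr[j - 1] + 1  # skip a word symbol
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    # The match may end anywhere in `word`, so take the cheapest end point.
    end = min(range(len(word) + 1), key=prev.__getitem__)
    return prev[end], end
```

For instance, `best_portion_match("search", "researching")` returns `(0, 8)`, since "search" occurs exactly inside "researching". A relevance score could then be derived from the distance, for example by normalizing it by the query length, though the scoring used in the paper is not stated here.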

DOI: 10.1007/3-540-45869-7_37



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Word Searching in Document Images Using Word Portion Matching</title>
<author>
<name sortKey="Lu, Yue" sort="Lu, Yue" uniqKey="Lu Y" first="Yue" last="Lu">Yue Lu</name>
</author>
<author>
<name sortKey="Tan, Lim" sort="Tan, Lim" uniqKey="Tan L" first="Lim" last="Tan">Lim Tan</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:19435F2C25ADE5DC5D73CA65197979D6C31294B6</idno>
<date when="2002" year="2002">2002</date>
<idno type="doi">10.1007/3-540-45869-7_37</idno>
<idno type="url">https://api.istex.fr/document/19435F2C25ADE5DC5D73CA65197979D6C31294B6/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000413</idno>
<idno type="wicri:Area/Istex/Curation">000406</idno>
<idno type="wicri:Area/Istex/Checkpoint">000F46</idno>
<idno type="wicri:doubleKey">0302-9743:2002:Lu Y:word:searching:in</idno>
<idno type="wicri:Area/Main/Merge">001912</idno>
<idno type="wicri:Area/Main/Curation">001832</idno>
<idno type="wicri:Area/Main/Exploration">001832</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Word Searching in Document Images Using Word Portion Matching</title>
<author>
<name sortKey="Lu, Yue" sort="Lu, Yue" uniqKey="Lu Y" first="Yue" last="Lu">Yue Lu</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Singapour</country>
<wicri:regionArea>Department of Computer Science, School of Computing National University of Singapore, 117543, Kent Ridge</wicri:regionArea>
<wicri:noRegion>Kent Ridge</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Singapour</country>
</affiliation>
</author>
<author>
<name sortKey="Tan, Lim" sort="Tan, Lim" uniqKey="Tan L" first="Lim" last="Tan">Lim Tan</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Singapour</country>
<wicri:regionArea>Department of Computer Science, School of Computing National University of Singapore, 117543, Kent Ridge</wicri:regionArea>
<wicri:noRegion>Kent Ridge</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Singapour</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2002</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">19435F2C25ADE5DC5D73CA65197979D6C31294B6</idno>
<idno type="DOI">10.1007/3-540-45869-7_37</idno>
<idno type="ChapterID">37</idno>
<idno type="ChapterID">Chap37</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: An approach with the capability of searching a word portion in document images is proposed in this paper, to facilitate the detection and location of the user-specified query words. A feature string is synthesized according to the character sequence in the user-specified word, and each word image extracted from documents is represented by a feature string. Then, an inexact string matching technology is utilized to measure the similarity between the two feature strings, based on which we can estimate how the document word image is relevant to the user-specified word and decide whether its portion is the same as the user-specified word. Experimental results on real document images show that it is a promising approach, which is capable of detecting and locating the document words that entirely match or partially match with the user-specified word.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Singapour</li>
</country>
</list>
<tree>
<country name="Singapour">
<noRegion>
<name sortKey="Lu, Yue" sort="Lu, Yue" uniqKey="Lu Y" first="Yue" last="Lu">Yue Lu</name>
</noRegion>
<name sortKey="Lu, Yue" sort="Lu, Yue" uniqKey="Lu Y" first="Yue" last="Lu">Yue Lu</name>
<name sortKey="Tan, Lim" sort="Tan, Lim" uniqKey="Tan L" first="Lim" last="Tan">Lim Tan</name>
<name sortKey="Tan, Lim" sort="Tan, Lim" uniqKey="Tan L" first="Lim" last="Tan">Lim Tan</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001832 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001832 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:19435F2C25ADE5DC5D73CA65197979D6C31294B6
   |texte=   Word Searching in Document Images Using Word Portion Matching
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024